In this paper we introduce a Bayesian framework for solving a class of problems termed Multi-agent Inverse Reinforcement Learning (MIRL). Compared to the well-known Inverse Reinforcement Learning (IRL) problem, MIRL is formalized in the context of a stochastic game rather than a Markov decision process (MDP). Games bring two primary challenges: First, the concept of optimality, central to MDPs, loses its meaning and must be replaced with a more general solution concept, such as the Nash equilibrium. Second, the non-uniqueness of equilibria means that in MIRL, in addition to multiple reasonable solutions for a given inversion model, there may be multiple inversion models that are all equally sensible approaches to solving the problem. We establish a theoretical foundation for competitive two-agent MIRL problems and propose a Bayesian optimization algorithm to solve them. We focus on the case of two-person zero-sum stochastic games, developing a generative model for the likelihood of the agents' unknown rewards given observed game play, assuming that the two agents follow a minimax bipolicy. As a numerical illustration, we apply our method to an abstract soccer game and investigate the relationship between the extent of prior information and the quality of the learned rewards. The results suggest that covariance structure is more important than mean value in reward priors.
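At a high level, the Bayesian inversion described above can be read as a posterior over the unknown reward pair; the following display is a sketch under assumed notation (the symbols $R_1$, $R_2$, $\mathcal{D}$, and $\pi^{*}$ are illustrative placeholders, not the paper's own definitions):

    % Sketch of the Bayesian inversion; notation assumed for illustration.
    % R_1, R_2 : unknown reward functions of the two agents
    % \mathcal{D} : observed game play (state-action trajectories)
    % \pi^{*}(R_1, R_2) : minimax bipolicy induced by the rewards
    p(R_1, R_2 \mid \mathcal{D}) \;\propto\; p\bigl(\mathcal{D} \mid \pi^{*}(R_1, R_2)\bigr)\, p(R_1, R_2)

Here $p(R_1, R_2)$ is the reward prior whose mean and covariance structure the soccer experiment varies.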